Human coding and computational text analysis are more powerful when combined. I offer a suite of exact methods that can increase the power of common hand-coding tasks by orders of magnitude. Human coding can both inform and be aided by rule-based information extraction, iteratively structuring queries on unstructured text.
Applying this method to public comments on U.S. federal agency rules, a sample of 10,894 hand-coded comments yields 41 million as-good-as-hand-coded comments regarding both the organizations that mobilized them and the extent to which policy changed in the direction they sought. This large sample enables new analyses of lobbying coalitions, social movements, and policy change.
Workflow: The googlesheets4 package allows analyzing and improving data in real time. For example, in Fig. 1:
Fig. 1: Coded Comments in a Google Sheet
| Entity | Pattern |
|---|---|
| 3M Co | 3M Co\|3M Cogent\|3M Health Information Systems\|Ceradyne\|Cogent Systems\|Hybrivet Systems |
| Teamsters Union | Brotherhood of Locomotive Engineers & Trainmen\|Brotherhood of Maint of Way Employ Div\|New England Teamsters & Trucking Pension\|Teamsters Airline Express Delivery Div\|Teamsters Local 357\|Teamsters Union\|Western Conf of Teamsters Pension Trust |
Fig. 2: Iteratively Building Regex Tables
For example, the legislators package adds variants (e.g., “AOC”) to standard legislator names.
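A regex table like the one in Fig. 2 can be applied to comment text with a few lines of code. The sketch below is in Python rather than the R workflow described here, and the names `REGEX_TABLE`, `compile_table`, `find_entities`, and `add_variant` are hypothetical. The case-insensitive, word-boundary matching is an assumption, not a setting taken from the source.

```python
import re

# Entity -> alternation pattern, copied from the two rows of the regex table.
REGEX_TABLE = {
    "3M Co": (
        "3M Co|3M Cogent|3M Health Information Systems|Ceradyne|"
        "Cogent Systems|Hybrivet Systems"
    ),
    "Teamsters Union": (
        "Brotherhood of Locomotive Engineers & Trainmen|"
        "Brotherhood of Maint of Way Employ Div|"
        "New England Teamsters & Trucking Pension|"
        "Teamsters Airline Express Delivery Div|Teamsters Local 357|"
        "Teamsters Union|Western Conf of Teamsters Pension Trust"
    ),
}


def compile_table(table):
    """Compile each entity's pattern once (assumed: ignore case, whole words)."""
    return {
        entity: re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
        for entity, pattern in table.items()
    }


def find_entities(text, compiled):
    """Return every entity whose pattern matches somewhere in the text."""
    return [entity for entity, rx in compiled.items() if rx.search(text)]


def add_variant(table, entity, variant):
    """Append a name variant found during hand-coding, escaped as a literal."""
    escaped = re.escape(variant)
    existing = table.get(entity)
    table[entity] = f"{existing}|{escaped}" if existing else escaped
```

Each time hand-coding surfaces a new alias, `add_variant` extends the pattern and the table is recompiled before the next pass over the text, which is one way to read the iterative step.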
Of 58 million comments on regulations.gov, the top 100 organizations mobilized 43,938,811 comments. The top ten organizations mobilized 25,947,612.
| Organization | Rules Lobbied On | Pressure Campaigns | Percent (Campaigns / Rules) | Comments | Average per Campaign |
|---|---|---|---|---|---|
| NRDC | 530 | 62 | 11.7% | 5,939,264 | 95,795 |
| Sierra Club | 591 | 110 | 18.6% | 5,111,922 | 46,472 |
| CREDO | 90 | 41 | 45.6% | 3,019,150 | 73,638 |
| Environmental Defense Fund | 111 | 31 | 27.9% | 2,849,517 | 91,920 |
| Center For Biological Diversity | 572 | 86 | 15.0% | 2,815,509 | 32,738 |
| Earthjustice | 235 | 59 | 25.1% | 2,080,583 | 35,264 |
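The two derived columns follow directly from the counts: the percentage is campaigns divided by rules lobbied on, and the average is comments divided by campaigns. A minimal sketch of that arithmetic, using the rows above (the names `ROWS` and `derived_columns` are illustrative only):

```python
# (organization, rules lobbied on, pressure campaigns, comments), from the table.
ROWS = [
    ("NRDC", 530, 62, 5_939_264),
    ("Sierra Club", 591, 110, 5_111_922),
    ("CREDO", 90, 41, 3_019_150),
    ("Environmental Defense Fund", 111, 31, 2_849_517),
    ("Center For Biological Diversity", 572, 86, 2_815_509),
    ("Earthjustice", 235, 59, 2_080_583),
]


def derived_columns(rules, campaigns, comments):
    """Return (percent of rules with a campaign, average comments per campaign)."""
    percent = round(100 * campaigns / rules, 1)
    average = round(comments / campaigns)
    return percent, average


for org, rules, campaigns, comments in ROWS:
    percent, average = derived_columns(rules, campaigns, comments)
    print(f"{org}: {percent}% of rules, {average:,} comments per campaign")
```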
Fig. 3: Iteratively Grouping Documents
Fig. 4: Identifying Groups of Linked Documents using Text Reuse (a 10-gram Window Function)
Fig. 5: Comments Posted to Regulations.gov
Comments that share a 10-gram with 99 or more other comments are classified as part of a mass comment campaign.
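The 10-gram rule can be sketched with an inverted index from each 10-word window to the comments that contain it. This is a simplified illustration, not the author's implementation. It assumes whitespace tokenization and lowercasing, and it reads the rule as "at least one 10-gram appears in 100 or more comments"; the function names are hypothetical.

```python
from collections import defaultdict


def ngrams(text, n=10):
    """Return the set of n-word windows in a text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_mass_comments(comments, n=10, min_others=99):
    """Flag ids of comments that share some n-gram with at least min_others others.

    comments: dict mapping comment id -> comment text.
    """
    ids_by_gram = defaultdict(set)
    for comment_id, text in comments.items():
        for gram in ngrams(text, n):
            ids_by_gram[gram].add(comment_id)

    flagged = set()
    for ids in ids_by_gram.values():
        # len(ids) - 1 counts the *other* comments containing this n-gram.
        if len(ids) - 1 >= min_others:
            flagged |= ids
    return flagged
```

Comments shorter than ten words produce no windows and are never flagged under this reading, so very short form letters would need a separate rule.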
Preprocessing tip:
Summarizing each document speeds hand-coding (e.g., the top three sentences ranked by textrank).
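To illustrate the idea behind a top-three-sentence summary, the sketch below ranks sentences with a simplified TextRank-style procedure: sentences are nodes, Jaccard word overlap gives the edge weights, and a PageRank iteration scores them. It is not a reimplementation of the textrank package. The sentence splitter, the lack of stop-word removal, the damping factor, and the function names are all assumptions made for brevity.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, top_n=3, damping=0.85, iterations=50):
    """Return the top_n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences

    words = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]

    # Symmetric edge weights: Jaccard overlap between sentence vocabularies.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = words[i] | words[j]
        if union:
            overlap = len(words[i] & words[j]) / len(union)
            weights[i][j] = weights[j][i] = overlap

    # Weighted PageRank by power iteration.
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

A coder would then read the three returned sentences first and open the full comment only when the summary is ambiguous.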
Fig. 6: Lobbying Success by Campaign Size
Public pressure on climate and environmental justice greatly affected policy documents, but a few organizations dominate lobbying coalitions. When tribal governments or local groups lobby without the support of national advocacy groups, policymakers typically ignore them.
Fig. 7: Policy Text Change by Coalition Size
linkit, fastlink, ML with hand-coded training set)